-
Notifications
You must be signed in to change notification settings - Fork 2.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Create a lossy index of resource -> ClassPathElement mapping #42525
Create a lossy index of resource -> ClassPathElement mapping #42525
Conversation
d09424c
to
19ce97b
Compare
8619e6d
to
34631a0
Compare
builder.addParentFirstElement(element); | ||
builder.addNormalPriorityElement(element); | ||
} else if (dep.isFlagSet(DependencyFlags.CLASSLOADER_LESSER_PRIORITY)) { | ||
builder.addLesserPriorityElement(element); | ||
} else { | ||
builder.addNormalPriorityElement(element); | ||
} | ||
builder.addElement(element); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The semantic has changed a bit here: we used to store all the elements in elements
and then do some logic to figure out if they were in lesserPriorityElements
when building the index.
I chose to simplify things and have a clearer semantic.
Thus why the logic has changed here and also why I renamed the method.
* We try to be clever as to how we build the resource key to reduce the number of misses. It might need further tuning in the | ||
* future. | ||
*/ | ||
public class ClassPathResourceIndex { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The removal of ClassLoaderState
and the addition of this class is the whole purpose of this patch. The rest is really to be able to make this happen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might just be the fact that it's my first day back from PTO, but just by reading the Javadoc, I don't understand how this index is supposed to work. I do understand what problem it tries to solve vs. previous approach, but I don't understand how it does it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does it by storing a prefix instead of the full class name. A prefix that is deemed to be smart enough to not cause a penalty.
As the ClassPathElement
contains a list of the resources, we can go through the index to reduce greatly the number of elments to go through and then make sure the resource is there.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's a bit more complex than that as you need to keep full references for some elements (see the sets with local classes, parent first resources...) but that's the general idea of it.
I don't mind adding some javadoc, just tell me if this explanation is good enough for you.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sounds like information that should be in the Javadoc 😉
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind adding some javadoc, just tell me if this explanation is good enough for you.
It's better for sure, but my fried brain is still having trouble grasping this. Who knows if it will recover or not...
public List<ClassPathElement> getAllElements(boolean onlyFromCurrentClassLoader) { | ||
List<ClassPathElement> ret = new ArrayList<>(); | ||
if (parent instanceof QuarkusClassLoader && !onlyFromCurrentClassLoader) { | ||
ret.addAll(((QuarkusClassLoader) parent).getAllElements(onlyFromCurrentClassLoader)); | ||
} | ||
ret.addAll(elements); | ||
return ret; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method wasn't used so I dropped it.
private final List<ClassPathElement> normalPriorityElements; | ||
private final List<ClassPathElement> lesserPriorityElements; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As explained in https://github.com/quarkusio/quarkus/pull/42525/files#r1719580032, I changed the semantic a bit here.
public List<ClassPathElement> getElementsWithResource(String name, boolean localOnly) { | ||
ensureOpen(name); | ||
|
||
boolean parentFirst = parentFirst(name, getClassPathResourceIndex()); | ||
|
||
List<ClassPathElement> ret = new ArrayList<>(); | ||
if (parent instanceof QuarkusClassLoader && !localOnly) { | ||
|
||
if (parentFirst && !localOnly && parent instanceof QuarkusClassLoader) { | ||
ret.addAll(((QuarkusClassLoader) parent).getElementsWithResource(name)); | ||
} | ||
ClassPathElement[] classPathElements = getState().loadableResources.get(name); | ||
if (classPathElements == null) { | ||
return ret; | ||
|
||
List<ClassPathElement> classPathElements = getClassPathResourceIndex().getClassPathElements(name); | ||
ret.addAll(classPathElements); | ||
|
||
if (!parentFirst && !localOnly && parent instanceof QuarkusClassLoader) { | ||
ret.addAll(((QuarkusClassLoader) parent).getElementsWithResource(name)); | ||
} | ||
ret.addAll(Arrays.asList(classPathElements)); | ||
|
||
return ret; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The behavior of this method was extremely odd as it was always returning the parent resources first. I fixed it to honor parent first in a subsequent commit.
I'll have a good look at this when I'm back (in 10 days or so) |
add0e8f
to
c7eaf65
Compare
This is ready for review, it passed CI on my fork a few times. |
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like very work, but I do want some clarification on the how the index works before looking even closer :)
/** | ||
* Whether the PathTree provides local resources. | ||
* <p> | ||
* For instance a directory or a file provides local resources. A jar does not. | ||
*/ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really understand what local resources are
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Local resources are resources that are local to your deployment. A jar is an external dependency so it's not local.
This is a preexisting notion as the QuarkusClassLoader
provides the local resources to the tests: see QuarkusClassLoader#getLocalClassNames()
in the existing code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still sounds confusing to me, but if we are already using that terminology, then 🤷🏽
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind changing the name to something that makes more sense to you - I really reused the terminology that was there.
Ideally, let's do that once this patch is in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be something like isPathsVisibleOutside()
instead to indicate whether the paths visited while processing this tree are visible outside the context of the visit
method.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isPathsVisibleOutside has an obscure semantic
Local is a term from a higher (classloading) layer. It doesn't apply well to PathTree
abstraction. isPathsValidOutside()
is exactly what this is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that I have to use visit is really an implementation detail
It's more than an implementation detail - it's an essential design details and a contract to process content of a PathTree
. There is really no other way of doing that besides "visiting". So if it's redesign one day, it will be a different design and implementation.
I'm just trying to know if the classes will be subject to change or not.
This notion does not exist on the PathTree
level though.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@gsmet is your concern about OpenContainerPathTree
? Would isArchiveOrigin()
be better for you?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, I don't think it is, especially because of the semantic in MultiRootPathTree
but I pushed the change as I'm a bit tired of arguing.
My personal opinion is that the semantic I chose was good and maybe the name needed some love. Maybe providesApplicationClasses
to be on par with QuarkusClassLoader#isApplicationClass()
as it's really the same semantic.
But I don't think the name was that bad once you understand the semantic.
I spent a lot of time on this patch and I would really appreciate if we could take a decision and merge it.
Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that if you don't understand the semantic, we might better have a quick call so that I explain it to you.
* We try to be clever as to how we build the resource key to reduce the number of misses. It might need further tuning in the | ||
* future. | ||
*/ | ||
public class ClassPathResourceIndex { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might just be the fact that it's my first day back from PTO, but just by reading the Javadoc, I don't understand how this index is supposed to work. I do understand what problem it tries to solve vs. previous approach, but I don't understand how it does it.
...s/bootstrap/core/src/main/java/io/quarkus/bootstrap/classloading/ClassPathResourceIndex.java
Outdated
Show resolved
Hide resolved
|
||
private static URL getClassPathElementResourceUrl(List<ClassPathElement> classPathElements, String name, | ||
boolean endsWithTrailingSlash) { | ||
for (ClassPathElement classPathElement : classPathElements) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a good old for(i=0...)
here in order to cut down on allocations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure we could but I don't think it's worse than it used to be. So ideally, let's do that in a follow-up PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
public List<String> getLocalClassNames() { | ||
public Set<String> getLocalClassNames() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this change of type?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because that's the semantic we have. A big random bag of unique strings :).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but what do we gain by changing it to a Set
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The values are coming from Sets now so I decided to be consistent rather than going from a Set to a List, especially since that's the semantic we want.
final ClassLoader ccl = Thread.currentThread().getContextClassLoader(); | ||
if (!(ccl instanceof QuarkusClassLoader)) { | ||
throw new IllegalStateException("The current classloader is not an instance of " | ||
+ QuarkusClassLoader.class.getName() + " but " + ccl.getClass().getName()); | ||
} | ||
|
||
String resourceName = fromClassNameToResourceName(className); | ||
List<ClassPathElement> res = getElements(resourceName, true); | ||
return !res.isEmpty(); | ||
ClassPathResourceIndex classPathResourceIndex = ((QuarkusClassLoader) ccl).getClassPathResourceIndex(); | ||
|
||
return classPathResourceIndex.getFirstClassPathElement(resourceName) != null; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This can be simplified a tad by using pattern matching with instanceof
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not. I don't mind adding some cuteness but I would like to get this patch in and avoid iterating on cosmetic details. I don't think the code is less readable than it used to be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I'm just mentioning it because the whole point of the commit was to simplify the method
@@ -239,8 +246,8 @@ public Enumeration<URL> getResources(String unsanitisedName, boolean parentAlrea | |||
//TODO: in theory resources could have been added in dev mode | |||
//but I don't think this really matters for this code path | |||
Set<URL> resources = new LinkedHashSet<>(); | |||
ClassPathElement[] providers = state.loadableResources.get(name); | |||
if (providers != null) { | |||
List<ClassPathElement> providers = classPathResourceIndex.getClassPathElements(name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change will also cause an extra allocation the way we iterate over the list
Totally agree |
Map<String, ClassPathElement[]> compactedResourceMapping = new HashMap<>(resourceMapping.size()); | ||
for (Entry<String, List<ClassPathElement>> resourceMappingEntry : resourceMapping.entrySet()) { | ||
compactedResourceMapping.put(resourceMappingEntry.getKey(), | ||
resourceMappingEntry.getValue().toArray(new ClassPathElement[resourceMappingEntry.getValue().size()])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
resourceMappingEntry.getValue().toArray(new ClassPathElement[resourceMappingEntry.getValue().size()])); | |
resourceMappingEntry.getValue().toArray(new ClassPathElement[0])); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All I really want really at this point is to have the Javadoc of ClassPathResourceIndex
improved.
Othen than that, all good for me!
c7eaf65
to
40b9076
Compare
@geoand see the additional commits. The only change compared to what we discussed is the removal of the Now we just have to pray I didn't break anything with these new changes. I didn't some sanity checks but let's see what CI has to say. |
} | ||
} | ||
return Collections.unmodifiableList(classPathElements); | ||
return classPathElements; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think given it's a hot path and not really user-consumable APIs, we could get rid of this safety.
PR looks good to me. The additional commits seem like overkill to me, but I don't care enough about that kind of thing to fight it :) |
@geoand didn’t you ask about these changes this morning? (At least the for loops and the pattern matching) |
Right, what I meant is that having those changes as additional commits seems like overkill to me - my bad for not clarifying |
Oh I wanted to make it easy for you to review them. I had plans to squash some of the commits (if not too much of a mess). Also I wanted a full CI run green before that to make sure I didn’t break my patch. |
🙏 |
This comment has been minimized.
This comment has been minimized.
40b9076
to
97879a3
Compare
I squashed the commits that were just tiny improvements on top of the rest and kept the ones with a specific semantic that I want to keep around. |
97879a3
to
e44bd90
Compare
*/ | ||
boolean providesLocalResources(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My comments were about PathTree
API only. This is a layer on top of it where PathTree
is an impl detail and so here the methods could reflect the meaning that's relevant in this context.
This comment has been minimized.
This comment has been minimized.
The size of ClassLoaderState can be extremely problematic so the idea of this patch is to have a lossy index instead and try to find a good compromise between speed and memory usage.
Previous implementation: Warmup Iteration 7: 63202.300 ops/ms Warmup Iteration 8: 64074.043 ops/ms Warmup Iteration 9: 63297.222 ops/ms Warmup Iteration 10: 63905.908 ops/ms New implementation: Warmup Iteration 6: 234089.005 ops/ms Warmup Iteration 7: 234372.742 ops/ms Warmup Iteration 8: 235722.737 ops/ms Warmup Iteration 9: 233265.148 ops/ms Also added a test to make sure we don't break it.
For whatever reason, we weren't honoring parentFirst in this method which looks like an oversight and un undesirable behavior.
With the new index, we don't need that anymore. I tested that the test introduced in this commit is still working fine (LiquibaseExtensionMigrateAtStartDirectoryChangeLogTest). Note that I fixed it as I forgot to consider the less priority elements there and I went for fixing it and then was wondering why this would be even useful. I also introduced a shortcut for empty resource.
Make sure we use the proper vocabulary at each level and make sure the class loader API is consistently using reloadable instead of local. Per discussion with Alexey.
e44bd90
to
dfa1455
Compare
@aloubyansky I rewrote the last commit as per our discussion. I think we should be all good now but I would appreciate some extra scrutiny. Thanks! |
Status for workflow
|
Note
This PR is part of a coordinated effort to address the OOM reported in #42471:
The size of the
ClassLoaderState
of theQuarkusClassLoader
can be extremely problematic (I have seen it regularly at 10+ MB in the various heap dumps I studied these past months).I made a first attempt at fixing it a few months ago but wasn't really happy with the approach. I approached the problem differently here.
The idea of this patch is to have a lossy index instead and try to find a good compromise between speed and memory usage.
And that's the only compromise we make as each
ClassPathElement
has a comprehensive index of what it contains so we can narrow down the candidates using the lossy index and then fully check the fewClassPathElement
s candidates.I personally think it makes things a bit more readable too but YMMV.
This is related to issue #42471 .
In this particular example:
state
is 13 MB+So we end up with 10 less MB per
QuarkusClassLoader
in this case (obviously this will depend on the number of classes in your app).As for speed, I made a few tests and things are not noticeably slower when starting an application in dev mode from what I could see. I made the implementation of
ClassPathResourceIndex#getResourceKey()
faster in a follow up commit.One thing that I noticed (but it's preexisting) is that I saw this call appear a few times in the flamegraph:
https://github.com/quarkusio/quarkus/blob/main/independent-projects/bootstrap/core/src/main/java/io/quarkus/bootstrap/classloading/PathTreeClassPathElement.java#L105-L107
When calling this from the CL we could bypass this check as we know the resource is there. But I'm not sure how to make it evolve from an API perspective. /cc @aloubyansky
This is better reviewed commit per commit as I added a few more things.
Situation before this patch:
Situation after this patch: